Skip to content

[GLUTEN-11485][VL] Fix the race condition in ArrowMemoryPool#11493

Merged
baibaichen merged 2 commits intoapache:mainfrom
marin-ma:fix-arrowpool
Jan 29, 2026
Merged

[GLUTEN-11485][VL] Fix the race condition in ArrowMemoryPool#11493
baibaichen merged 2 commits intoapache:mainfrom
marin-ma:fix-arrowpool

Conversation

@marin-ma
Copy link
Contributor

@marin-ma marin-ma commented Jan 26, 2026

Related issue: #11485

@github-actions github-actions bot added the VELOX label Jan 26, 2026
Comment on lines +333 to +339
if (auto pool = it->second.lock()) {
return pool;
}
arrowPools_.erase(name);
}
auto pool = std::make_shared<ArrowMemoryPool>(
blockListener_.get(), [this, name](arrow::MemoryPool* pool) { this->dropMemoryPool(name); });

auto pool = std::make_shared<ArrowMemoryPool>(blockListener_.get());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we no longer remove the entries, will there be any memory leak risk caused by the expanding entry list in arrowPools_?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like not an issue as long as the number of keys is not too large.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@baibaichen mentioned in the issue that we can use a background thread to clean up, but I wonder if that might be too heavy for the current use of the arrow pool, since only shuffle and parquet write create named arrow memory pool.

If necessary, perhaps looping over the weak_ptr each time getOrCreateArrowMemoryPool is called and remove the expired ones. @zhztheplayer What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like not an issue as long as the number of keys is not too large.

I thought it might be up to the usage. Hi @marin-ma, do you think we can skip cleaning up the Arrow pools? In that case, we can simply store shared pointer as the class member.

If necessary, perhaps looping over the weak_ptr each time getOrCreateArrowMemoryPool is called and remove the expired ones.

Otherwise, this might be helpful, but we should also figure out whether it will cause any instability regarding performance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there aren’t many entries in arrowPools_, I think it’s fine to keep them via shared_ptr. A MemoryPool object itself shouldn’t take much memory anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using weak_ptr can help to clarify the ownership of the arrow pool. For example if the arrow pool is used by shuffle A, it should be destroyed together with the shuffle writer A itself. And then the next shuffle B will create a new pool. In this way we can also track the memory allocation status for each shuffle separately.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marin-ma I meant, is it feasible to use std::unordered_map<std::string, std::shared_ptr<ArrowMemoryPool>> arrowPools_? Only by changing the map value type from std::weak_ptr to std::shared_ptr.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because in the latest PR's code, the expired map entries won't get removed from map until being replaced by new value of the same key. Is this an intentional change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because in the latest PR's code, the expired map entries won't get removed from map until being replaced by new value of the same key. Is this an intentional change?

Yes. In this way it can create new arrow pools for each shuffle writer instance. If changing to std::unordered_map<std::string, std::shared_ptr<ArrowMemoryPool>> arrowPools_, the new shuffle writers will reuse the ones created by the previous shuffle.

Copy link
Member

@zhztheplayer zhztheplayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a valid fix, was just curious about the reason we are still using weak_ptr as the map value, since the expired entries won't be removed from the map anyway.

@marin-ma
Copy link
Contributor Author

@baibaichen Do you have any more comments? Thanks!

@baibaichen
Copy link
Contributor

@baibaichen Do you have any more comments? Thanks!

No. Let's merge to see whether the issue is fixed or not.

@baibaichen baibaichen merged commit 3349e8d into apache:main Jan 29, 2026
58 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants